feat telegram: add voice message support for telegram with pluggable #38
Conversation
We require contributors to sign our Contributor License Agreement, and we don't have @soufianebouaddis on file. In order for us to review and merge your code, please create a PR where you add yourself to the contributors of JobRunr. This only needs to be done once. As soon as that is done, we can review your PR. Thanks a lot!
@cla-bot check
The cla-bot has been summoned, and re-checked this pull request!
…tests for the audit/executions/deliveries REST API endpoints (T58, T60) against a real PostgreSQL via Testcontainers; all tests green, BUILD SUCCESS
…migration for role_agent_config, RoleAgentConfig entity/repository/service with hierarchy fallback and caching, model override integrated through the whole pipeline (ChatRestController→SseStreamingService→ChatService), REST API endpoints for management, 26 new tests, all 1147 tests green
Hi @soufianebouaddis, thanks for submitting this PR. Sorry for the late review. From what I can see, the actual transcription is yet to be done. I think we should have at least one working implementation. Is this something you'd still like to work on? My second concern is that we're mixing telegram text and voice messages: is it possible to find a nice abstraction?
Hi @auloin, thanks for the review and sorry for the incomplete implementation. I'll continue working on this and add a concrete transcription provider so the feature works end-to-end. I'll also revisit the current design to avoid mixing text and voice handling in TelegramChannel and introduce a cleaner abstraction for message types. I'll update the PR shortly with these changes. Thanks for the feedback!
Hi @auloin, this update extends TelegramChannel.consume() to handle both text and voice inputs through a single flow. Voice messages are downloaded via TelegramVoiceDownloader, transcribed to text using a SpeechToTextService abstraction, and then passed to agent.respondTo() the same way as text messages. I added working transcription implementations (local via whisper-cli + ffmpeg, and OpenAI), with a mock still available for testing. The flow normalizes everything to text before reaching the agent, so text and voice are no longer mixed beyond the input layer.
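For reference, the normalized flow described here could be sketched roughly as below. Only consume(), respondTo(), and the SpeechToTextService abstraction come from this PR; the Update record, the downloader lambda, and the field names are illustrative stand-ins, not the actual JobRunr code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: voice is transcribed to text at the input layer,
// so text and voice share one path into the agent.
public class VoiceFlowSketch {

    interface SpeechToTextService {
        String transcribe(byte[] audio);
    }

    // Stand-in for a Telegram update carrying either text or voice bytes.
    record Update(String text, byte[] voice) {
        boolean hasVoice() { return voice != null; }
    }

    static class Agent {
        String lastPrompt;
        void respondTo(String prompt) { lastPrompt = prompt; }
    }

    static class TelegramChannel {
        final SpeechToTextService speechToText;
        final Agent agent;

        TelegramChannel(SpeechToTextService speechToText, Agent agent) {
            this.speechToText = speechToText;
            this.agent = agent;
        }

        void consume(Update update) {
            // Normalize: voice becomes text before reaching the agent.
            String text = update.hasVoice()
                    ? speechToText.transcribe(update.voice())
                    : update.text();
            agent.respondTo(text);
        }
    }

    public static void main(String[] args) {
        Agent agent = new Agent();
        // Mock transcription: pretend the audio decodes to its UTF-8 text.
        TelegramChannel channel = new TelegramChannel(
                audio -> new String(audio, StandardCharsets.UTF_8), agent);
        channel.consume(new Update(null,
                "hello from voice".getBytes(StandardCharsets.UTF_8)));
        System.out.println(agent.lastPrompt); // prints "hello from voice"
    }
}
```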
Thanks @soufianebouaddis. I'll review it as soon as possible. In the meantime could you already pull the main branch into your branch and solve the conflicts?
Hi @auloin, thanks for the heads up. I'll pull the latest changes from main and resolve the conflicts.
Force-pushed 33d5bf0 to 21404e0
Hi @auloin, here are the relevant logs:

Exception in thread "pool-4-thread-1" java.lang.RuntimeException: Failed to send both HTML and fallback messages

Would you prefer that I implement message chunking to split long responses, or should we explore another approach?
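For reference, the message-chunking idea raised here could look something like the following sketch. This helper is hypothetical, not part of the PR; 4096 is Telegram's per-message character limit as stated elsewhere in the thread, and the split is deliberately naive (fixed-size slices, no attempt to preserve word or HTML-tag boundaries).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that splits a long agent reply into
// Telegram-sized pieces to avoid the "message is too long" error.
public class TelegramChunker {
    static final int TELEGRAM_MAX = 4096; // Telegram's message length limit

    static List<String> chunk(String text, int max) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < text.length(); i += max) {
            parts.add(text.substring(i, Math.min(text.length(), i + max)));
        }
        if (parts.isEmpty()) {
            parts.add(""); // keep behavior defined for empty replies
        }
        return parts;
    }
}
```

Each chunk would then be sent as a separate sendMessage call; a real implementation would probably want to split on whitespace or paragraph boundaries instead of mid-word.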
Interesting finding @soufianebouaddis. Does it happen often? Is it a blocker? I wonder if it can be a separate task so we can keep the scope of this PR small. |
auloin
left a comment
Thanks again for the work on this @soufianebouaddis. I think we're pretty close; I have a few remarks to see if we can simplify the implementation a bit.
```java
@Service
@ConditionalOnProperty(name = "speech.provider", havingValue = "whisper-cpp")
public class WhisperCppSpeechToTextService implements SpeechToTextService {
```
I've been looking for a java library that does speech to text and I found vosk: https://github.com/alphacep/vosk-api. If it works, what do you think of making it the default @soufianebouaddis? We could also drop this implementation which requires having both ffmpeg and whisper-cli.
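For context, the whisper-cpp provider under discussion shells out to the two external tools named above. A sketch of the command lines it would presumably build is below; the exact flags and binary names are assumptions based on common ffmpeg and whisper.cpp usage, not the PR's actual code.

```java
import java.util.List;

// Hypothetical command builders for the external-tool pipeline:
// ffmpeg converts Telegram's OGG voice note to 16 kHz mono WAV,
// then whisper-cli transcribes the WAV.
public class WhisperCppCommands {

    static List<String> ffmpegConvert(String oggPath, String wavPath) {
        // -ar 16000 / -ac 1: whisper models expect 16 kHz mono PCM;
        // -y overwrites any stale output file.
        return List.of("ffmpeg", "-y", "-i", oggPath,
                "-ar", "16000", "-ac", "1", wavPath);
    }

    static List<String> whisperTranscribe(String modelPath, String wavPath) {
        // --no-timestamps keeps the output as plain transcribed text.
        return List.of("whisper-cli", "-m", modelPath,
                "-f", wavPath, "--no-timestamps");
    }
}
```

Each list would be handed to a ProcessBuilder; this double external dependency is exactly what switching to a pure-Java library like Vosk would remove.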
…/speech/MockSpeechToTextService and refactor OpenAiSpeechToTextService to delegate it to SpringAI
Hi @auloin, thanks again for the review. I pushed a new revision addressing the first two remarks: MockSpeechToTextService has been moved out of production code into src/test, since keeping it as a default runtime fallback was indeed misleading. I also spent some time looking into Vosk as an alternative to WhisperCppSpeechToTextService. It is definitely attractive from a portability standpoint since it removes the whisper-cli dependency, but Telegram voice messages still come in OGG format and Vosk expects WAV input, so an audio conversion step is still required unless we introduce an additional Java decoder. As for the "message is too long" exception I hit during testing: it does not happen often, and it is not related to voice handling itself. It only occurs when the generated agent reply exceeds Telegram's 4096-character limit, which can happen with regular text messages as well.
Add support for voice messages in Telegram channel

This PR extends TelegramChannel to handle voice messages in addition to text.

Changes
- Extended TelegramChannel.consume() to process both text and voice messages
- Introduced a SpeechToTextService abstraction
- Transcribed voice messages go through the same agent.respondTo(...) flow as text

Transcription
- MockSpeechToTextService (no external dependency, suitable for testing)
- OpenAiSpeechToTextService (enabled via speech.provider=openai)

Notes

Next Steps / Ideas
- Additional transcription providers (Spring AI AudioTranscriptionModel, or local Whisper plugin)
- Could move to Spring AI's AudioTranscriptionModel if preferred
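Based on the @ConditionalOnProperty bindings quoted in the review thread, provider selection is presumably driven by a single property; the snippet below is an illustrative application.properties fragment, with property name and values taken from the PR, not a verified copy of its configuration.

```properties
# Select the transcription backend (values from the PR's @ConditionalOnProperty bindings)
speech.provider=openai
# speech.provider=whisper-cpp   # local whisper-cli + ffmpeg implementation
```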